Quantifying Uncertainty in Random Forests via Confidence Intervals and Hypothesis Tests

نویسندگان

  • Lucas Mentch
  • Giles Hooker
چکیده

This work develops formal statistical inference procedures for predictions generated by supervised learning ensembles. Ensemble methods based on bootstrapping, such as bagging and random forests, have improved the predictive accuracy of individual trees, but fail to provide a framework in which distributional results can be easily determined. Instead of aggregating full bootstrap samples, we consider predicting by averaging over trees built on subsamples of the training set and demonstrate that the resulting estimator takes the form of a U-statistic. As such, predictions for individual feature vectors are asymptotically normal, allowing for confidence intervals to accompany predictions. In practice, a subset of subsamples is used for computational speed; here our estimators take the form of incomplete U-statistics and equivalent results are derived. We further demonstrate that this setup provides a framework for testing the significance of features. Moreover, the internal estimation method we develop allows us to estimate the variance parameters and perform these inference procedures at no additional computational cost. Simulations and illustrations on a real data set are provided.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Joint Confidence Regions

Confidence intervals are one of the most important topics in mathematical statistics which are related to statistical hypothesis tests. In a confidence interval, the aim is that to find a random interval that coverage the unknown parameter with high probability. Confidence intervals and its different forms have been extensively discussed in standard statistical books. Since the most of stati...

متن کامل

Bayes Interval Estimation on the Parameters of the Weibull Distribution for Complete and Censored Tests

A method for constructing confidence intervals on parameters of a continuous probability distribution is developed in this paper. The objective is to present a model for an uncertainty represented by parameters of a probability density function.  As an application, confidence intervals for the two parameters of the Weibull distribution along with their joint confidence interval are derived. The...

متن کامل

Probabilistic Approaches to Better Quantifying the Results of Epidemiologic Studies

Typical statistical analysis of epidemiologic data captures uncertainty due to random sampling variation, but ignores more systematic sources of variation such as selection bias, measurement error, and unobserved confounding. Such sources are often only mentioned via qualitative caveats, perhaps under the heading of 'study limitations.' Recently, however, there has been considerable interest an...

متن کامل

Statistical Uncertainty Estimation Using Random Forests and Its Application to Drought Forecast

Drought is part of natural climate variability and ranks the first natural disaster in the world. Drought forecasting plays an important role in mitigating impacts on agriculture and water resources. In this study, a drought forecast model based on the random forest method is proposed to predict the time series of monthly standardized precipitation index SPI . We demonstrate model application b...

متن کامل

LINEAR HYPOTHESIS TESTING USING DLR METRIC

Several practical problems of hypotheses testing can be under a general linear model analysis of variance which would be examined. In analysis of variance, when the response random variable Y , has linear relationship with several random variables X, another important model as analysis of covariance can be used. In this paper, assuming that Y is fuzzy and using DLR metric, a method for testing ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Journal of Machine Learning Research

دوره 17  شماره 

صفحات  -

تاریخ انتشار 2016